A Bandit Learning Algorithm and Applications to Auction Design
We consider online bandit learning in which, at every time step, an algorithm has to make a decision and then observes only its reward. The goal is to design efficient (polynomial-time) algorithms that achieve a total reward close to that of the best fixed decision in hindsight. In this paper, we introduce a new notion of $(\lambda,\mu)$-concave functions and present a bandit learning algorithm whose performance guarantee is characterized as a function of the concavity parameters $\lambda$ and $\mu$. The algorithm is based on mirror descent, in which the update directions follow the gradient of the multilinear extensions of the reward functions. The regret bound induced by our algorithm is $\widetilde{O}(\sqrt{T})$, which is nearly optimal.
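To make the algorithmic template concrete, the following display sketches a generic bandit mirror-descent update of the kind described above; the gradient estimator, mirror map $\Phi$, and step size $\eta$ shown here are illustrative assumptions and not necessarily the paper's exact choices:
$$x_{t+1} \in \arg\max_{x \in \mathcal{K}} \left\{ \eta \, \langle \hat{g}_t, x \rangle - D_{\Phi}(x, x_t) \right\}, \qquad \mathbb{E}\big[\hat{g}_t \mid x_t\big] \approx \nabla F_t(x_t),$$
where $F_t$ is the multilinear extension of the reward function at step $t$, $\hat{g}_t$ is a gradient estimate built from the single observed reward, and $D_{\Phi}$ is the Bregman divergence induced by $\Phi$.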
Review for NeurIPS paper: A Bandit Learning Algorithm and Applications to Auction Design
Additional Feedback: The paper studies the online convex optimization problem, except that the functions that arrive online are not actually concave. They are "close to" concave, which the paper formalizes as (lambda, mu)-concavity. The idea is that, in many problems of interest where the input functions are not concave, the paper discretizes the function and considers the multilinear extension of the discretized function, which happens to be (lambda, mu)-concave for reasonable values of lambda and mu. The paper presents three applications to illustrate the value of its approach. The first of these is the analysis of adaptive dynamics on (lambda, mu)-smooth games, where previously high welfare was known to be guaranteed (i.e., the average welfare of playing the dynamics over time is at least lambda/mu of the optimal welfare) only for dynamics that had vanishing regret for each player.
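For reference, the multilinear extension mentioned here has a standard form: for a set function $f : 2^N \to \mathbb{R}$ on a ground set $N$, its multilinear extension over $[0,1]^N$ is
$$F(x) = \sum_{S \subseteq N} f(S) \prod_{i \in S} x_i \prod_{j \notin S} (1 - x_j),$$
i.e., the expected value of $f$ on a random set that includes each element $i$ independently with probability $x_i$. How exactly the paper discretizes the original decision space before taking this extension is not spelled out in the review.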